Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
Abstract
The prevailing framework for solving referring expression grounding is based on a two-stage process: 1) detecting proposals with an object detector and 2) grounding the referent to one of the proposals. Existing two-stage solutions mostly focus on the grounding step, which aims to align the expressions with the proposals. In this paper, we argue that these methods overlook an obvious mismatch between the roles of proposals in the two stages: they generate proposals solely based on the detection confidence (i.e., expression-agnostic), hoping that the proposals contain all right instances in the expression (i.e., expression-aware). Due to this mismatch, current two-stage methods suffer from a severe performance drop between detected and ground-truth proposals. To this end, we propose Ref-NMS, which is the first method to yield expression-aware proposals at the first stage. Ref-NMS regards all nouns in the expression as critical objects, and introduces a lightweight module to predict a score for aligning each box with a critical object. These scores can guide the NMS operation to filter out the boxes irrelevant to the expression, increasing the recall of critical objects and resulting in significantly improved grounding performance. Since Ref-NMS is agnostic to the grounding step, it can be easily integrated into any state-of-the-art two-stage method. Extensive ablation studies on several backbones, benchmarks, and tasks consistently demonstrate the superiority of Ref-NMS. Codes are available at: https://github.com/ChopinSharp/ref-nms.
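To illustrate how per-box scores could guide the NMS operation, here is a minimal sketch in pure Python. It is not the authors' implementation: the function names `iou` and `score_guided_nms`, the IoU threshold, and in particular the choice to fuse detection confidence and expression relatedness by multiplication are assumptions made for illustration, since the abstract does not say how the scores are combined.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def score_guided_nms(
    boxes: Sequence[Box],
    det_conf: Sequence[float],
    relatedness: Sequence[float],
    iou_thresh: float = 0.5,
    top_k: Optional[int] = None,
) -> List[int]:
    """Greedy NMS ranked by a fused score instead of detection confidence alone.

    Assumption (hypothetical): the fused score is the product of the
    detector confidence and the expression-relatedness score of each box.
    Returns the indices of the kept boxes, best first.
    """
    fused = [d * r for d, r in zip(det_conf, relatedness)]
    order = sorted(range(len(boxes)), key=lambda i: fused[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
            if top_k is not None and len(keep) == top_k:
                break
    return keep


# Two heavily overlapping boxes (0 and 1) plus one separate box (2).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
det_conf = [0.9, 0.8, 0.7]
relatedness = [0.1, 0.9, 0.5]  # box 1 matches the expression best

print(score_guided_nms(boxes, det_conf, [1.0, 1.0, 1.0]))  # plain NMS
print(score_guided_nms(boxes, det_conf, relatedness))      # score-guided NMS
```

In this toy input, ranking by detection confidence alone keeps box 0 and suppresses the overlapping box 1, while ranking by the fused score keeps box 1 instead, which is the kind of recall effect on expression-relevant boxes that the abstract describes.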
Similar resources
Modeling Collaborative Referring for Situated Referential Grounding
In situated dialogue, because humans and agents have mismatched capabilities of perceiving the shared physical world, referential grounding becomes difficult. Humans and agents will need to make extra efforts by collaborating with each other to mediate a shared perceptual basis and to come to a mutual understanding of intended referents in the environment. In this paper, we have extended our pr...
Breaking Security Design Bottlenecks in Payment Technologies
The subject of payment technology is central to e-business applications. Payment technologies' main role is to facilitate transactions by saving time and providing cost effectiveness, value for money, and flexibility for consumers. Payment Technologies for E-Commerce brings to light electronic payment systems that serve as the backbone for payments made electronically. The book also em...
Grounding Referring Expressions in Images by Variational Context
We focus on grounding (i.e., localizing or linking) referring expressions in images, e.g., “largest elephant standing behind baby elephant”. This is a general yet challenging vision-language task since it does not only require the localization of objects, but also the multimodal comprehension of context — visual attributes (e.g., “largest”, “baby”) and relationships (e.g., “behind”) that help t...
Grounding Spatio-Semantic Referring Expressions for Human-Robot Interaction
The human language is one of the most natural interfaces for humans to interact with robots. This paper presents a robot system that retrieves everyday objects with unconstrained natural language descriptions. A core issue for the system is semantic and spatial grounding, which is to infer objects and their spatial relationships from images and natural language expressions. We introduce a two-s...
Symbol Grounding for Robot Dialog Research Proposal
Advancements in robotics have led to an ever-growing repertoire of software capabilities (e.g., recognition, mapping, and object manipulation). However, as robotic capabilities grow, the complexity of operating and interacting with such robots increases (such as through speech, gesture, scripting, or programming). Language-based communication can offer users the ability to work with physically and...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i2.16188